Abstract

In this paper, a general framework for the analysis of a connection between the training of artificial 
neural networks via the dynamics of Markov chains and the approximation of conservation law equa- 
tions is proposed. This framework allows us to demonstrate an intrinsic link between microscopic and 
macroscopic models for evolution via the concept of perturbed generalized dynamic systems. The main 
result is exemplified with a number of illustrative examples where efficient numerical approximations 
follow directly from network-based computational models, viewed here as Markov chain approximations. 
Finally, stability and consistency conditions of such computational models are discussed. 
1 Introduction

Many concepts and tools used in information theory, control, theory of approximation, dynamical systems 
and artificial intelligence are closely linked. Among such tools are neural networks. The success in using 
neural networks in many application areas is well documented in the literature. These areas include, but are not
limited to, aeronautics, pattern and speech recognition (in particular in the context of various classification
problems) [71 |Tl], optoelectronics, robotics and autonomous navigation, determining structure-property
relationships in materials science applications [iOlEH], constructing dynamic observers [4], identification and
control of complex biotechnological cycles, industrial processes, and nonlinear systems in general [SI [H], and
modelling complex dynamic systems and nonlinear phenomena such as hysteresis [IT]. In many cases, a
key to this success lies in associating (computational) neural networks with the atomistic
approach to molecular design, which aims to determine relationships between the properties of a structure
(thermomechanical, electromagnetic, etc.) and the structure itself described at the microscopic (mesoscopic,
and eventually macroscopic) level. Another set of tools and successful approaches to structure modelling
is the macroscopic approach, essentially based on the conservation law equations. In this contribution we
develop a general framework for establishing a link between neural network concepts and numerical
approximations of conservation laws.
One of the major objectives in neural network theory is the study of constructive approaches to the
design of effective learning algorithms [15]. We observe that models for learning of neural networks and 
models for the evolution of dynamic systems are closely connected. The idea of using neural networks as 
components in dynamic systems is now well established in the literature (e.g., [30]). Models for the dynamic 
interactions between evolution and learning have been used by a number of authors who use evolutionary 
algorithms for the optimization of neural structures. It is often argued that artificial neural networks (ANN) 
and evolutionary algorithms (EA) can be efficiently combined together. Indeed, if ANN is interpreted as a 
model for biological nervous systems and the learning process, then EA can be interpreted as a model for 
biological evolution itself and the process of adaptation on large scales [3?]. In this case, ANN becomes a
useful mathematical tool for linking deterministic and probabilistic approaches in approximation theory.
This idea was developed in [2ll[25] where the evolution was proposed to be modelled by a generalized equation
with training/learning rules (algorithms) understood in a probabilistic sense. The problem of learning in
neural network theory is formulated in terms of the minimization of an error function which is a function of
adaptive parameters (weights and biases). In developing training/learning algorithms for ANN we have to
deal with incomplete data, and an initial step of this procedure can be associated with the initialization of the
weights in the network. Assigning random values to these weights [12] leads us to a probabilistic framework
[39]. Postprocessing of the available information is required for deriving deterministic models. For example,
given a (training) set of input-output pairs $(u_i, y_i)$, $i = 1, \ldots, n$ (for $n$ training/example subjects), we can
"smooth" the data. The task is to construct a map which provides a good generalization (i.e., for a given $u$ that
does not belong to the training set, the map should provide a reasonable estimate/prediction of the unobservable output).
This is reducible to finding the best (functional) approximation to multivariate empirical (possibly noisy,
sparse, with some unobservable regions) data. However, if some additional "smoothing" constraints are used,
we solve not the original, but a regularized problem. This provides a link between standard (Tikhonov-like)
approaches to deterministic, but ill-posed, problems and statistical (e.g., Bayesian) approaches [9]. In
this contribution we explore further this link between deterministic and probabilistic approaches in neural 
network theory by considering learning and evolution in an intrinsic dynamic connection. 
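
The link between "smoothing" constraints and regularization can be illustrated with a minimal sketch. It assumes a scalar linear model $y \approx w u$ with a quadratic (Tikhonov-like) penalty $\lambda w^2$; the data, the helper name `ridge_slope`, and the value of $\lambda$ are hypothetical and are not taken from the cited works.

```python
def ridge_slope(us, ys, lam):
    """Minimise sum((y - w*u)**2) + lam * w**2 over the scalar weight w.

    Setting the derivative to zero gives w = sum(u*y) / (sum(u*u) + lam).
    """
    num = sum(u * y for u, y in zip(us, ys))
    den = sum(u * u for u in us) + lam
    return num / den


# Hypothetical noisy training pairs (u_i, y_i), roughly following y = 2u.
us = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 1.9, 4.2, 5.8]

w_plain = ridge_slope(us, ys, lam=0.0)  # original (unregularized) problem
w_reg = ridge_slope(us, ys, lam=1.0)    # regularized problem
print(w_plain, w_reg)
```

With $\lambda > 0$ the estimated weight is pulled towards zero. In the statistical reading, the same penalty can be viewed as a zero-mean Gaussian prior on $w$, which is one simple instance of the deterministic/probabilistic link discussed above.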

The starting point of our discussion is a formalization of the learning/training process in terms of the 
minimization of an error function based on the neural network approach. In particular, starting with a 
minimal structure (no hidden layers), new connections, neurons, and layers are added according to certain
rules. The most commonly used rules are related to a (probabilistic) adaptation of the network weights, a procedure
which is intrinsically coupled with the overall network size (number of neurons on each layer and the total
number of layers). Since the size and topology of neural networks are closely connected with the required
training time, in dealing with the training/learning problem we will have to construct dynamic models.
Such models arise naturally in applications of ANN to optimal control problems, in particular in those areas
where system parameters are time varying [22]. The approach developed in this contribution has common
features with the on-line evolution approach (previously discussed in the literature in the context of game
theory and control problems [2]) in the sense that our procedure can be interpreted as an on-line evolution
algorithm applied to a certain time-dependent equation. The type of this equation is motivated by the fact
that network architectures can be established by approximating the dynamic programming equations [5].
In this paper, we also use a dynamic equation that corresponds to a certain neural network architecture and
is derived from the framework of generalized dynamic systems (GDS) developed in [23, 25].

In what follows, although we focus our attention on feed-forward-based networks complemented by
backpropagation-like numerical learning procedures, the underlying procedure is the same for other
architectures of neural systems where we have to define the connections between layers, the parameters (e.g.,
initial weights) and some learning rules. We demonstrate that these connections can be established by
combining "forward evolution" (using neural networks) and "backward evolution" (using Markov chains
associated with the original process). This combination ensures the minimization of uncertainty in the
following sense. By processing information in response to discrete and continuous inputs, the ANN allows us
to model dynamic systems. Since without a correction mechanism uncertainty may increase dramatically
over time, we use Markov chains to construct an appropriate mechanism where the architecture of the ANN
depends on the cone of macroscopic events [25]. Although our approach is completely different in principle
from the classical Gelenbe approach (see, e.g., [3]), we note that in both cases one has to exploit the idea
of neural-network random structures. Since the original time-dependent models are discretized in our
approach, one can interpret the resulting scheme via interactions among neurons, so it is possible to calculate
probabilities of activation of network neurons in a way similar to that used in the Gelenbe approach.

Finally, we note that when applying ANNs to complex dynamic systems, one of the most important issues 
is to stabilize the learning mechanism [16] (e.g., one has to avoid the unbounded growth of the adjustable 
parameter in ANN). We show in this paper that this non-trivial task can be linked directly to discretizing 
conservation law models, and that the ANN stability can be treated effectively through the stability of 
numerical approximations of conservation laws, rather than through the Lyapunov stability (e.g. [21]). 

2 Systems Theoretic Framework for Constructing Neural Networks and Dynamic Training Rules

By applying neural networks to learning and identification of dynamic systems, we attempt to influence the
behavior of these systems by some components built into the systems [30, 28]. Hence, the procedures for
controlling such dynamic systems using ANN should be connected in one way or another with approximations
of system dynamics. Moreover, since control, in addition to being fast and accurate, should
ensure stability and robustness of the system, the stability of control is closely connected in such cases with
the stability of numerical approximations of models describing the system dynamics.

Recall that in most typical situations a nonlinear dynamic system can be described from a systems 
theoretic point of view by the following set of equations 

$$
\begin{cases}
x(k+1) = f[x(k), u(k)], & f(0, 0) = 0, \\
y(k) = h[x(k)], & h(0) = 0,
\end{cases} \qquad (2.1)
$$

where $u(k), y(k) \in \mathbb{R}^p$ and $x(k) \in \mathbb{R}^n$ are input, output, and state vectors, respectively, at discrete time $k$,
and the mappings $f : \mathbb{R}^n \times \mathbb{R}^p \to \mathbb{R}^n$ and $h : \mathbb{R}^n \to \mathbb{R}^p$ are given functions, while control based on the state
vector and control based only on input-output data need not be the same [28]. Model (2.1) provides
adaptive means for implementing digital neuro-controlled systems and for further insight into neural
learning and adaptation. We note that instead of model (2.1), the dynamic system model can be determined
through identification (e.g., with multi-layered neural networks) using the data obtained by a recurrence
equation describing the output signal and taking into account the internal perturbations (which influence
the output signal):

$$ y(k+1) = \tilde{h}\bigl(x(k+1), y(k), u(k), e(k+1)\bigr). \qquad (2.2) $$

The new output $y(k+1)$, determined from the state vector $x(k+1)$ by using the given input data $[y(k), u(k)]$,
will introduce an error $e(k+1)$, representing internal perturbations at time $k+1$. For example, it can mimic
the effects of synaptic time delay representing the "memory" of neurons which may differ from connection
to connection. The mapping $\tilde{h}$ in (2.2) can be viewed as a perturbed function $h$. In principle, this equation,
which combines the information given by model (2.1), is sufficient to describe the training procedure. From
a numerical point of view such a nonlinear-control-framework consideration [28] can be viewed as an
approximation to the first equation in (2.1), describing the evolution of states of the system, supplemented
by an approximation of the "constitutive" law (the second equation in (2.1)), describing some properties of
the system, control rules, and/or requirements imposed by the user. Hence, if the original description of
the system is set by using a control-theoretic approach, model (2.2) can be viewed as an approximation to
the Hamilton-Jacobi-Bellman-type (HJB) equation describing some "energetic" characteristic of the system
such as cost or value function [25, 26]. If the entire controlled process is modelled by the ANN, then the
training/learning process for the neural network (using specific outputs with each of several inputs) can be 
associated with the dynamics of a certain conservation law model that describes the evolution of the non- 
linear system. In this case, the inputs of ANN are associated with the initial conditions of the conservation 
law model. The description of the output-input training process will be understood here as a constructive 
backpropagation based on a process-associated Markov chain approximation. This approximation will be 
linked to numerical approximations of conservation laws. 
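
As a minimal sketch of the discrete-time description used in this section, the following code iterates a scalar state-space model of the form $x(k+1) = f[x(k), u(k)]$, $y(k) = h[x(k)]$ and evaluates a perturbed output map. The particular choices of `f`, `h`, and the additive form of `h_perturbed` are hypothetical illustrations; the only properties taken from the text are $f(0,0) = 0$, $h(0) = 0$, and the presence of an error term $e(k+1)$.

```python
import math


def f(x, u):
    # Hypothetical state map satisfying f(0, 0) = 0.
    return 0.5 * math.tanh(x) + u


def h(x):
    # Hypothetical output map satisfying h(0) = 0.
    return math.sin(x)


def simulate(x0, inputs):
    """Iterate x(k+1) = f[x(k), u(k)] and record y(k) = h[x(k)]."""
    x, outputs = x0, []
    for u in inputs:
        outputs.append(h(x))
        x = f(x, u)
    return outputs


def h_perturbed(x_next, y, u, e_next):
    # Hypothetical special case of a perturbed output map: the nominal
    # output plus an additive error; y and u are accepted but unused here.
    return h(x_next) + e_next


print(simulate(0.0, [1.0, 0.0, 0.0]))
```

Starting from the origin with zero input, the state and output stay at zero, which is the equilibrium implied by $f(0,0) = 0$ and $h(0) = 0$.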

Following [28], we consider a neural network as a conveniently parameterized class of nonlinear maps. 
In the next paragraphs we develop the methodology for linking this class of maps to conservation law 
approximations. In this context it is important to note that when a network is chosen to approximate a 
given mapping using I/O data, the I/O set is finite, whereas a conservation law defines an infinite set of
input-output data. However, once such a conservation law model is approximated, the process of neural network
training and the approximate conservation laws can be described within the same general framework based
on the concept of generalized dynamic systems [23, 25]. In what follows, our consideration will be pertinent
to discrete time (even though the original system might be continuous in time), which reflects the fact
that most complex systems are controlled by computers and therefore it is quite natural to consider
approximations to complex dynamic systems as being discrete in time [31]. Moreover, any model, no matter
how sophisticated it is, should account for uncertainties of the environment [23, 25] which are easier to deal
with in discrete-time formulations [31]. Furthermore, the stability issues are much more transparent in
the discrete space-time of events where dynamic systems evolve. Finally, in the context of neural networks 
the discrete-time consideration is the most natural way to proceed with the analysis because neural networks 
are proved to be universal approximators and any continuous function can be approximated arbitrarily well 
on a compact set with a (multilayer feedforward, radial basis function, or another) neural network (e.g.. 



3 Approximating System Hamiltonians with Neural Networks 



If the dynamic system under consideration is Hamiltonian, the problem of training neural networks with 
dynamic system rules governing the evolution of this system requires efficient procedures for approximating 
the Hamiltonian of such a system by a neural network. As a starting point, let $T$ be a given set of times during
which the network is trained, $S$ be a state space of the dynamic system, $U_T$ be a set of all permissible strategies
for training, and $X_T$ be the domain of definition of the system Hamiltonian, assumed to be a compact Borel
set [13]. Further constructions are based on the following fundamental property of applications of neural
networks in the theory of approximation [T^ IT9| [6]:

Theorem 3.1 If $H \in L^1(X_T)$, then for any arbitrarily small $\varepsilon > 0$ there exists a network $\tilde{H}$ such that

$$ \|H - \tilde{H}\|_{L^1(X_T)} < \varepsilon. \qquad (3.1) $$

For the description and control of complex dynamical systems via neural networks, both feedforward
(FNN) evolutionary networks (for the modelling of nonlinear input-output dynamics) and feedback
evolutionary networks (for the implementation of training algorithms) are required. The link between those can
be established on the basis of the Markov chain approximations as follows. First, recall that if $H$ is a
continuous function we can construct a uniform approximation (e.g., by using a FNN [mUH]). Note also that
if smoothness of the function $H$ is measured in terms of its Fourier representation, an actual estimate of
the network performance can also be obtained (e.g., [6]). Therefore, it is natural to associate a FNN model
with the approximation $\tilde{H}$ of the system Hamiltonian (for simplicity, with one layer of sigmoidal nodes or
units). In $\mathbb{R}^p$ this model can be implemented by the following functions

$$ H_n(x) = \sum_{i=1}^{n} \alpha_i \chi_t(y_i \cdot x + \beta_i) + \alpha_0 \qquad (3.2) $$

with $x, y_i \in \mathbb{R}^n$, $\alpha_i, \beta_i \in \mathbb{R}$. The point that we are making here is that such models alone may be of limited
applicability when applied in uncertain dynamic environments, because the quality of network performance
depends strongly on its training capabilities with respect to the regularity of $H$ and the structure of
$X_T$. In considering computational models of system dynamics, the structure of a network-approximator
like (3.2) cannot be considered to be fixed, but rather it has to be adapted to the change of environment
and internal perturbations. As a consequence, the most appropriate approximation in the framework of
(3.1) can lead to a situation where the activation function $\chi_t$ may not be continuous, subject to the regularity of
$H$. Furthermore, even if this function is continuous, in order to account for the topological structure of $X_T$,
each network layer has to be characterized by both feedforward and feedback operators.
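
A one-layer sigmoidal model of the form (3.2) can be evaluated with a few lines of code. The sketch below assumes the logistic function as the sigmoidal activation (any function with limits 0 and 1 at minus and plus infinity would serve); the function names and the sample parameters are hypothetical.

```python
import math


def sigmoid(t):
    """Logistic function, written to avoid overflow for large |t|."""
    if t >= 0.0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def H_n(x, alphas, ys, betas, alpha0):
    """Evaluate alpha0 + sum_i alpha_i * sigmoid(y_i . x + beta_i)."""
    total = alpha0
    for a, y, b in zip(alphas, ys, betas):
        dot = sum(yj * xj for yj, xj in zip(y, x))
        total += a * sigmoid(dot + b)
    return total


# Two sigmoidal nodes on a two-dimensional input (illustrative values only).
value = H_n([0.0, 0.0], [2.0, -1.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], 0.5)
print(value)
```

In this sketch the weights are fixed. The point made in the text is precisely that, in a changing environment, the number of nodes and the parameters of such a model cannot be treated as fixed.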

Remark 3.1 In a special case, where $X_T = \mathbb{R}^N$ and hence $H : \mathbb{R}^N \to \mathbb{R}^M$ ($N, M \in \mathbb{N}$), such operators may
be interpreted through weight matrices JE^.

The situation we described above is of the same nature as in control theory, where a regularity balance
between control and value functions has to be achieved [23, 24]. In the context of our framework, this balance
is achieved by simultaneous computational treatments of the "feedforward" and "backward" evolutionary
processes, which can be effectively implemented by using feedback networks such as block feedback
networks (BFN) along with feedforward approximators. Below we explain this idea in detail.

Consider a Hamiltonian $H$ mapping $X_T$ into $\mathbb{R}^M$ and its "computational equivalent", that is, a Turing
computable function $H_T$ such that

$$ H : X_T \to \mathbb{R}^M \quad \text{and} \quad H_T : \mathbb{N} \to \mathbb{N}. \qquad (3.3) $$

Then, since BFN models have the same computing power as Turing machines (e.g., [35]), we conclude that
there exists a BFN $H_T$ such that for any input $n$ from a subset of $\mathbb{N}$ a finite number of network steps
produces $H_T(n)$. Therefore, due to the Gödel numbering procedure (e.g., [10] and references therein), any
Hamiltonian function $H : \mathbb{R}^N \to \mathbb{R}^M$ can be arbitrarily well approximated by a network implementation of the
function $H_T$. Since both feedforward $\tilde{H}$ and feedback $H_T$ networks provide an approximation to $H$, the
development of constructive algorithms for training requires further study of the connection between $\tilde{H}$
and $H_T$. This connection between the concepts pertinent to feedforward and feedback networks is often
overlooked. Examples where this connection becomes transparent are provided by associative memory
networks or recurrent neural networks, where feedforward connections should be supplemented by some
"memory" neurons or synaptic time delay, which may vary from connection to connection. A natural way
to establish such a connection is to construct a Markov chain associated with the system evolution in such
a way that it reflects the process of network training on the problem-specific information.

Now, we are in a position to formulate the problem of training in terms of approximating the system 
Hamiltonian in such a way that the necessity of applying both forward and backward dynamic rules is 
transparent. For this purpose we consider a relatively simple case, where we assume that 

$$ \chi_t : \mathbb{R} \to S \qquad (3.4) $$

is a sigmoidal function, meaning that

$$ \lim_{t \to -\infty} \chi_t = 0, \qquad \lim_{t \to +\infty} \chi_t = 1. \qquad (3.5) $$



The function $\chi_t$ here is the activation function for a neural network defined by its neurons as the following
mapping

$$ \chi_t \circ \mu : T \otimes S \otimes U_T \to S, \qquad (3.6) $$

where $\mu : T \otimes S \otimes U_T \to \mathbb{R}$ is known as the decision maker (DM) function [25], which has to be
adjusted (trained) so that the neural network (3.6) leads to an approximation of $H$. If the network (3.6)
is trained with some dynamic rules by using a dynamic system whose Hamiltonian is $H$, then the process
of approximation of the function $H$ in terms of (3.1) can be seen as the construction of a training strategy for
a new network $H_n^\varepsilon$ depending on the arbitrarily small positive parameter $\varepsilon$ and the arbitrarily large number $n$ of
sigmoidal nodes of the associated network (e.g., (3.2)) so that

$$ \|H - H_n^\varepsilon\|_{L^1(X_T)} \to \min, \qquad (3.7) $$

where $H \in L^1(X_T)$. The main difficulty in the solution of problem (3.7) stems from the a priori unknown
character of the dependency of the network $H_n^\varepsilon$ on parameters which determine the function $\mu$. In the most general
setting, the problem of constructing the mapping $\mu$ is intrinsically connected with the definition of dynamic
rules in singular stochastic control problems [25]; interpreting (3.7), one expects that

$$ \text{if } \varepsilon \to 0^+ \text{ and } n \to \infty \text{ then } H_n^\varepsilon \to H. \qquad (3.8) $$

However, for the constructive solution of (3.7) by using neural networks we need additional information. For
example, if FNNs are applied to the solution of the problem, we need information on a coupling rule between
$\varepsilon$, $n$, and the topology specified by $X_T$. This coupling rule will determine conditions for the system stability.
If BFNs are applied to the approximation of $H$ on an arbitrary set $X_T$, we need additional information on
the network dimension and architecture [35]. This information can be made available only in a sequential
manner, and the appropriate tool for the analysis of the network performance in such situations is a family
of Discrete Markovian Decision Processes (DMDP). In this case a model for the training process, written in
terms of the decision maker function $\mu$, depends on Markov chain parameters with possible discontinuities
that are dependent on values of the sigmoidal function $\chi_t$. If the functional dependency of $\mu$ is chosen a
priori, this may lead to the underestimation of possible irregularities of the function $H$, which can happen if,
e.g., smoothness of $H$ is measured in terms of its Fourier representation [6, 27]. In the next section we aim
at constructing a model that gives an approximation to $\mu$ with respect to some learning rules determined
from the Markovian property of the process $(x_t, \mu)$.



4 Modelling Dynamics of Network Training 

From a constructive approximation point of view, the training process of neural networks describing dynamic 
systems evolution can be seen as the solution to the following problem. We have to construct a model 
(possibly, a hierarchy of models [23]) for the decision maker function $\mu$ in such a way that a neural network
approximates the Hamiltonian of a dynamic system evolution, while the dynamics of this evolution is
described by the process of network training.

Any two sets of input-output data in the training process can be represented by the mathematical model
of a dynamic system that couples two space-time events of the system evolution, $e_n$ and $e_{n+1}$, by a function
of the perturbed velocity and the system Hamiltonian or its approximation $\tilde{H}$:

$$ e_{n+1} = \tilde{H}(v_\varepsilon, e_n), \qquad n = 0, 1, \ldots \qquad (4.1) $$

The perturbed velocity is introduced to account for the changing environment and/or internal perturbations
[25]. Then, if we specify a sequence of events $(e_0, e_1, \ldots)$ by temporal evolution and formalize the dynamics
of the system by a discrete-time model, we obtain the following two equations [25]



$$
\begin{cases}
x_{t+1} = H_1(v_1, x_t), \\
h_{\tau+1} = H_0(v_0, h_\tau),
\end{cases} \qquad (4.2)
$$

where $H_1$ is an approximation to $H$ and $v_1 = \lim_{\varepsilon \to 0^+} v_\varepsilon$, $H_0$ is an operator for sequential corrections of such an
approximation needed due to system external/internal perturbations, and $v_0$ is the function characterizing
the rate of such perturbations. The dynamics of the system described by (4.2) operates on two coupled
temporal scales, $t$ and $\tau$, with spatial trajectories described by $x_t$ and $h_\tau$, respectively. Since perturbations
(internal and/or external) are an intrinsic part of any dynamic system, we associate the class of neural networks
with built-in training capability (determined by models for $\mu$) with Infinite Length Perturbed Markov Chains
(ILPMC) [25]. Since the form of the functional dependency of $\mu$ cannot be fixed a priori, the weights and
the network structure may change, which leads to a situation much more complicated than traditionally
dealt with (see Fig. 1). Some algorithms dealing with such a situation are known (e.g., cascade-correlation,
pruning algorithms, etc.). However, our approach is different from those previously developed since it is based
on the Markovian character of the process $(x_t, \mu)$, associated with both system dynamics and the process
of training, and the concept of generalized dynamic systems (with appropriate assumptions on dynamic
stochastic rules [25]). In our approach, by using conditional probabilities of the Markov chain, the velocity
function between two macroscopic events in the evolution of the generalized dynamic system is introduced
as a measure of changes which take place on the microscopic level with respect to the macroscopic behavior
of the system. This velocity function provides a link between microscopic and macroscopic models for the
evolution of dynamic systems.

Figure 1: A typical unit of network architecture



Similar to model (3.7), models (4.1) and (4.2) are interpreted in terms of the limit $\varepsilon \to 0^+$ and $n \to \infty$,
and the connection between them can be analyzed when learning rules for the system evolution in response
to perturbations are introduced. In particular, in the limit (3.8) both sequences $(x_0, x_1, \ldots)$ and $(h_0, h_1, \ldots)$
merge. As an important part of the system dynamics, perturbations can be formalized from the very
beginning of the modelling process as a decision making process with limited information available.

Remark 4.1 Since perturbed and unperturbed models might give rise to qualitatively distinct types of
descriptions of system behavior for any arbitrary $\varepsilon > 0$, it has already been emphasized that the perturbation
parameter alone cannot be an appropriate characteristic of the model's uncertainty, and this fact results in
two informational sequences in model (4.2) considered on two different time scales.

The learning rules are introduced into the model by the equation written in terms of the function $\mu$ (Section
3). In particular, following [25, 26], we arrive at

Proposition 4.1 If $H \in L^1(X_T)$, then the process of training/learning can be associated with the following
equation:

(4.3)

where, in the context of neural networks, $v_1$ is the velocity of information transmission between neurons,
and $f_0$ is a training/learning goal defined by a priori knowledge/assumptions on $X_T$ and the function $H$.



This process can be approximated by an appropriately constructed Markov chain associated with the
evolution of the dynamic system under consideration. Since the function $\mu$ is required to be adapted to a priori
given information, the practical implementation of training procedures may lead to approximations of the
network-activator functions by a piecewise-deterministic stochastic process. Such non-diffusion stochastic
models have been previously studied in the theory of DMDP and in theoretical physics (see [25] and references
therein). Such models are multiscale models for the evolution of dynamic systems described by (4.2) and,
as pointed out in [23], can be interpreted by a system of coupled differential equations on two scales:

$$
\begin{cases}
\dot{h}(\tau) = v_0(\tau, h, \mu), \\
\dfrac{\partial \mu}{\partial t} + v_1(t, x, \mu) \dfrac{\partial \mu}{\partial x} = 0.
\end{cases} \qquad (4.4)
$$

We conclude this section with the following observation. 

Remark 4.2 Both parts of the perturbed velocity function, $v_0$ and $v_1$, inherit their dependency on the
decision-maker function $\mu$. If two events between which GDS evolution has to be studied are specified (e.g.,
two sets of input-output data are entered), then a pair of functions $(h(\tau), \mu(t, x))$ gives the solution to the
training problem under consideration.

Next, we describe a procedure that allows us to approximate this pair. 

5 Network-Based Computational Models for Dynamics as Markov Chain Approximations

Although the perturbed system dynamics $x_t^\varepsilon$ might not be governed by the Markovian property (and the
function $x_t^\varepsilon$ might not be sigmoidal), the pair of functions $(h(\tau), \mu(t, x))$ does possess the Markovian property
[25]. Therefore, the key idea here is based on the construction of a Markov chain approximation
simultaneously with an approximation of the system dynamics (which depends on Markov chain parameters). This
allows us to guarantee system stability and to derive stability conditions in explicit form. In our constructions
the Markov chain will play the role of a "training/learning" rule for the system dynamics considered in the
case where the perturbed system's velocity is replaced by its approximation (function $v_1$) in the macroscopic
(or decision maker's) frame of reference.

More precisely, we will approximate the pair of functions $(h(\tau), \mu(t, x))$ (which describes the process of
GDS evolution and possesses the Markovian property) by a pair of discrete functions

$$ (h(\tau), \mu(t, x)) \to (\xi_i^j, \mu_i^j), \qquad (5.1) $$

where $\xi_i^j$ is an associated (with the microscopic frame of reference) Markov chain state.

First, we consider model (4.1) describing the evolution of dynamic systems in discrete space-time of
events. Let an elementary space-time cell be $C_{ij}$ and the complete spatio-temporal region, where the system
dynamics is studied, be $G$ such that two events

$$ e_j, e_{j+1} \in C_{ij} = [x_i, x_{i+1}] \otimes [t^j, t^{j+1}] \subset G \qquad (5.2) $$

of system evolution are governed by the process $(x_t, \mu_t)$. Then we specify these events by two pairs of
discrete functions

$$ e_j = (\xi_i^j, \mu(x_i, t^j)), \qquad e_{j+1} = (\xi_{i+1}^{j+1}, \mu(x_{i+1}, t^{j+1})), \qquad (5.3) $$

where $\xi_i^j = x_i^j$ and $\xi_{i+1}^{j+1} = x_{i+1}^{j+1}$ are states of the associated Markov chain.



Next, we consider the perturbed generalized dynamic system whose training/learning dynamics can be
described by the following equation, obtained from (4.3) or (4.4) under appropriate assumptions

$$ \frac{\partial \mu}{\partial t} + v_1(t, x, \mu) \frac{\partial \mu}{\partial x} = f_0(t, x, \mu) \qquad (5.4) $$

with the approximation of the initial condition in the DM-time scale

$$ \mu(x, t)|_{t = t_0} = \delta(\varepsilon), \qquad (5.5) $$

where $\delta(\varepsilon)$ is a set of initial conditions with $\varepsilon$ dependent on the approximation of the function $v_0$ in the
coupled system (4.2) (or (4.4)).
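
To make the structure of the transport-type equation (5.4) concrete, the following minimal sketch advances a grid function by one explicit step. It is an illustration under simplifying assumptions that are not part of the original model: the velocity `v` and the source `f0` are constants, the grid is uniform and periodic, and first-order upwind differencing is used. The function name `upwind_step` is hypothetical.

```python
def upwind_step(mu, v, f0, tau, h):
    """One explicit upwind step for d(mu)/dt + v * d(mu)/dx = f0.

    mu  : list of grid values on a uniform periodic grid
    v   : constant velocity (assumption of this sketch)
    f0  : constant source term (assumption of this sketch)
    tau : time step, h : space step; requires tau * |v| / h <= 1
    """
    n = len(mu)
    lam = tau / h
    v_plus, v_minus = max(v, 0.0), max(-v, 0.0)
    new = []
    for i in range(n):
        left = mu[i - 1]          # periodic neighbour on the left
        right = mu[(i + 1) % n]   # periodic neighbour on the right
        new.append((1.0 - lam * abs(v)) * mu[i]
                   + lam * v_plus * left
                   + lam * v_minus * right
                   + tau * f0)
    return new


shifted = upwind_step([0.0, 1.0, 0.0, 0.0], v=1.0, f0=0.0, tau=1.0, h=1.0)
smeared = upwind_step([0.0, 1.0, 0.0, 0.0], v=1.0, f0=0.0, tau=0.5, h=1.0)
print(shifted, smeared)
```

When $\tau |v| / h \le 1$ the three weights multiplying the old grid values are nonnegative and sum to one, so with `f0 = 0` the step conserves the total of `mu` and each new value is a convex combination of old values. This is the property that permits a probabilistic reading of such schemes.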

To preserve basic macroscopic features of the system, the values of jumps $\Delta \xi_i^j = \xi_{i+1}^{j+1} - \xi_i^j$ of this chain
should be subordinated to the corresponding approximation of system-environment boundaries. This can
be done via establishing the connection between the GDS and perturbed GDS.

Definition 5.1 Let (5.2), (5.3) be two subsequent macroscopic events of GDS evolution that are taking
place with probability 1. Then the GDS velocity function between the macroscopic events $e_j$ and $e_{j+1}$ can be
defined in an elementary space-time cell $C_{ij} \subset G$ as

$$ v(t, x) = \lim_{\tau \to 0} \frac{v_{mc}}{\tau}, \qquad (5.6) $$

where the numerator under the limit in (5.6) is the velocity of the Markov chain ($v_{mc}$) between two
subsequent macroscopic events.

We observe that 

Remark 5.1 If $\lim_{\varepsilon \to 0^+} v_\varepsilon = v_1$ then

$$ \lim_{\varepsilon \to 0^+,\ n \to \infty} v(t, x) = v_1, \qquad (5.7) $$

and together with (4jJ_) this defines an Infinite Length Unperturbed Markov Chain (ILUMC).

Since perturbations cannot be ignored, (5.7) can be fulfilled only approximately, which leads to the
consideration of an ILPMC, the concept already mentioned in Section 4.

Before formulating consistency conditions for the Markov chain associated with the dynamics of the training
rule (5.4), (5.5), we recall [25] that

Definition 5.2 The state space in the initial moment of observation in the macroscopic frame of reference 
is defined as 

$$ S(i; 0) = \{x_i,\ i = 0, 1, 2, \ldots, N;\ N = 2n,\ n = [(T - t_0)/\tau]\}, \qquad (5.8) $$

and subsequently the cone of macroscopic events of system evolution is defined by a set of macroscopic events
as a mapping from the minimal resolution set for the identification of all macroscopic events relevant to the
system evolution in the limit $n \to \infty$ to $\mathcal{K}(i; j) \subset S(i; j)$, where

$$ \mathcal{K}(i; j) = \{(x_i, t_j),\ i = k, \ldots, 2n - k,\ j = k,\ k = 0, \ldots, n\}. \qquad (5.9) $$
Now we are in a position to formulate 



Proposition 5.1 The consistency conditions (local and global, respectively) of the Markov Chain ^J^^, n < 
oo with the Markov process (/i(r), x)), defined by the mathematical model of CDS evolution ^5.4^ , \5. 5]) . 



are: 



E^'^^^^^-^^'^Aa" = vi{x,,t^,fi^)r + o{h + T) (5.10) 



cov. 



Th\{Xi,^l^) 



Aef = o(/i + r). (5.11) 



Using these ideas and following [25], a simple approximation to (5.4), (5.5) can be constructed, and in the
next sections we generalize this approximation to a large class of schemes for approximating the dynamics
of training rules in neural network applications. Let us first highlight the main steps for the construction of
the scheme proposed in [25].

• In the cone of macroscopic events we introduced a floating grid: 



cc;^ = {(xi, '), i = k, 2n - k, j = k, k = 0, n}, (5.12) 

where tj-'"^ = + Tj_i when j > 1, = t° + r when j = 1, and tj^"^ = t° when j = 0, where we 
constructed the following discrete scheme (based on upwind approximations with flux limiters [33] ) 

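$$d_i^{j+1} = d_i^j\left\{1 - \frac{\tau}{h}\left[|v| + v^-\gamma_4 - v^+\gamma_1\right]\right\} + \frac{\tau}{h}\, d_{i-1}^j\left[v^+(1+\gamma_2) + v^-\gamma_4\right] + \frac{\tau}{h}\, d_{i+1}^j\left[v^-(1-\gamma_3) - v^+\gamma_1\right] + \frac{\tau}{h}\, d_{i-2}^j\left[-v^+\gamma_2\right] + \frac{\tau}{h}\, d_{i+2}^j\left[v^-\gamma_3\right], \qquad (5.13)$$

where $d_i$ is a discrete function approximating the function $\mu$ on grid (5.12).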

It is easy to see that the sum of all coefficients of the unknown function on the right-hand side of (5.13)
equals one. This fact allows us to associate these coefficients with transition probabilities of a Markov
Chain, provided those coefficients are nonnegative:

$$1 - \frac{\tau}{h}\left(|v| + v^-\gamma_4 - v^+\gamma_1\right) \ge 0, \quad \gamma_2 \le 0, \quad \gamma_3 \ge 0,$$
$$v^+(1+\gamma_2) + v^-\gamma_4 \ge 0, \quad v^-(\gamma_3 - 1) + v^+\gamma_1 \le 0. \qquad (5.14)$$

The consistency conditions (local and global, respectively) of the Markov Chain (defined by time-transitions
of the discrete scheme (5.13)) with the process $(\mu(\tau), \mu(t,x))$ (defined by the model (5.4), (5.5))
couple the flux limiters of the scheme:

$$\tau\left[v^-(1 - \gamma_4 + \gamma_3) - v^+(1 + \gamma_1 - \gamma_2) - v\right] = o(\tau + h), \qquad (5.15)$$
$$\tau\left\{h\left[v^+(1 - \gamma_1 - 3\gamma_2) + v^-(1 + \gamma_4 + 3\gamma_3)\right] - \tau v_{mc}^2\right\} = o(\tau + h), \qquad (5.16)$$

where the Markov chain velocity in this case is determined as $v_{mc} = v^-(1 - \gamma_4 + \gamma_3) - v^+(1 + \gamma_1 - \gamma_2)$, and
$o$ is the Landau symbol. Using the idea of probabilistic characteristics, the term $o(\tau + h)$ in (5.15) and
(5.16) can be eliminated. The nonnegativeness of covariance leads to an additional stability condition

$$\frac{\tau}{h} \le \frac{v^+(1 - \gamma_1 - 3\gamma_2) + v^-(1 + \gamma_4 + 3\gamma_3)}{\left[v^-(1 + \gamma_3 - \gamma_4) - v^+(1 + \gamma_1 - \gamma_2)\right]^2}, \qquad (5.17)$$

which can be satisfied under an appropriate choice of the flux limiters. As a result, we have arrived at a
stable Markov chain approximation of the training process.



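More precisely, if the interpolation interval $\tau$ is such that conditions (5.14) and (5.17) are satisfied and
the transition probabilities $p_{ik}$ of the Markov Chain ($n < \infty$) are taken in the form

$$p_{ik} = \begin{cases} 1 - \frac{\tau}{h}\left(|v| + v^-\gamma_4 - v^+\gamma_1\right), & k = i, \\ \frac{\tau}{h}\left[v^+(1+\gamma_2) + v^-\gamma_4\right], & k = i-1, \\ \frac{\tau}{h}\left[v^-(1-\gamma_3) - v^+\gamma_1\right], & k = i+1, \\ -\frac{\tau}{h}\, v^+\gamma_2, & k = i-2, \\ \frac{\tau}{h}\, v^-\gamma_3, & k = i+2, \\ 0, & \text{otherwise}, \end{cases} \qquad (5.18)$$

$\forall j = 0, \dots, n-1$ and $i = j, \dots, N-j$ ($\gamma_2 = 0$ for $i = j$ and $\gamma_3 = 0$ for $i = N-j$), then the Markov Chain
approximation of the process $(\mu(\tau), \mu(x,t))$ is stable, and

$$d_i^{j+1} = \sum_{k} p_{ik}\, d_k^j. \qquad (5.19)$$
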

A numerical procedure like (5.19) is an explicit (evolution forward) stabilization procedure where the
DM-function is a stabilizing factor subject to the velocity of the system. Moreover, when $n \to \infty$ the velocity
of the Markov Chain converges to the velocity of the process in the sense of the Markov theorem (e.g., [25]).
This puts the approach used here on a rigorous theoretical basis. In the next section we show that this
approach can be generalized to an important class of conservation-law-based models for the training process
of neural networks.
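
The probabilistic reading of scheme (5.13) can be checked numerically. The following is a minimal sketch, not the implementation of [25]: it assumes the splitting $v^+ = \max(v, 0)$, $v^- = \max(-v, 0)$ (so that $|v| = v^+ + v^-$), and the function names are hypothetical. It builds the five transition probabilities and computes the mean and variance of one jump, which are the quantities entering (5.15) and (5.16).

```python
def split_velocity(v):
    """Assumed convention: v_plus = max(v, 0), v_minus = max(-v, 0)."""
    return max(v, 0.0), max(-v, 0.0)


def transition_probabilities(v, tau, h, g1, g2, g3, g4):
    """Coefficients of scheme (5.13), keyed by the jump k (in units of h)."""
    v_plus, v_minus = split_velocity(v)
    lam = tau / h
    return {
        0: 1.0 - lam * (abs(v) + v_minus * g4 - v_plus * g1),
        -1: lam * (v_plus * (1.0 + g2) + v_minus * g4),
        +1: lam * (v_minus * (1.0 - g3) - v_plus * g1),
        -2: -lam * v_plus * g2,
        +2: lam * v_minus * g3,
    }


def jump_moments(probabilities, h):
    """Mean and variance of a single jump of the chain."""
    mean = sum(p * k * h for k, p in probabilities.items())
    second = sum(p * (k * h) ** 2 for k, p in probabilities.items())
    return mean, second - mean ** 2


if __name__ == "__main__":
    p = transition_probabilities(v=1.0, tau=0.1, h=0.5,
                                 g1=-0.5, g2=-0.5, g3=0.5, g4=0.5)
    print(p, jump_moments(p, h=0.5))
```

With these illustrative limiter values the coefficients are nonnegative and sum to one, the mean jump equals $\tau v_{mc}$ with $v_{mc} = -v$, and the variance equals $\tau\,(3h|v| - \tau v^2)$.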



6 Network-Based Probabilistic Approach to Conservation Law 
Approximations 

The network-based methodology developed in the previous sections can be applied to conservation law 
approximations. In what follows we demonstrate this on a number of practical examples where we show 
that many efficient finite difference schemes for the approximation of conservation law equations follow 
directly from our technique. 

Consider a model analogous to that of training/learning dynamics for perturbed GDS, but with $\mu \to u$,
$v_1 \to a$, and $f_0 = 0$:

$$\frac{\partial u}{\partial t} + a(t,x,u)\frac{\partial u}{\partial x} = 0, \qquad (6.1)$$

where $u$ is some physical quantity of interest, and $a$ is the velocity related to the transport of that quantity.
With the Cauchy initial condition

$$u(x,0) = u_0(x), \quad x \in \mathbb{R}, \qquad (6.2)$$

this equation gives a model of a physical system without dissipation in a form easily reducible to a
nonlinear (quasi-linear) conservation law. Indeed, if $a(u) = F'(u)$, then model (6.1), (6.2) is equivalent to

$$\frac{\partial u}{\partial t} + \frac{\partial F}{\partial x} = 0, \quad u(x,0) = u_0(x), \quad x \in \mathbb{R}. \qquad (6.3)$$



This model and model (5.4), (5.5) describing the dynamics of network training belong to the same class
of mathematical models, and such models provide building blocks for many applications of mathematics in
science and engineering. One of the sources of difficulties for numerical solutions of such problems is that in
many practical situations the solution to (6.3) might not be continuous, and one has to deal with possible
discontinuities across a certain curve $x = \sigma(t)$. In other words, if

$$\lim_{x \to \sigma(t)^-} u(x,t) = u_{\text{left}}, \quad \lim_{x \to \sigma(t)^+} u(x,t) = u_{\text{right}}, \quad \text{and} \quad u_{\text{left}} \ne u_{\text{right}}, \qquad (6.4)$$

one should be able to select a physically meaningful solution from the set of all possible solutions (since two or
more classical characteristics in this case might pass through the same point). In particular, considering the Cauchy
problem (6.1), (6.2), we note that its solution should be understood in the generalized sense, where both
the Rankine-Hugoniot condition [18]

$$\sigma'(t)\,[u] = [F], \qquad (6.5)$$

where $[\cdot]$ denotes a jump across $x = \sigma(t)$ (i.e. $[u] = u_{\text{right}} - u_{\text{left}}$), and the entropy condition

$$a(u_{\text{right}}) < \sigma'(t) < a(u_{\text{left}}), \qquad (6.6)$$

meaning that the entropy increases as material crosses the discontinuity, should be satisfied. Under these
circumstances the development of constructive procedures for efficient numerical schemes for the solution
of (6.1), (6.2) becomes a very important task in the theory and practice of conservation laws, as well as in a
number of related fields. The stability of such schemes is at the heart of the success in achieving this task.
Although viscosity solutions could provide some insight into these problems, they would not provide details
on the solution stability. Hyperbolic partial differential equation (PDE) models are models for information
propagation, while probabilistic approaches imply diffusion. However, since such models need to be solved
numerically, we will demonstrate that the stability conditions, derived in our approach in a straightforward
manner, coincide with the stability conditions typical for numerical approximations of hyperbolic PDEs.
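
The jump and entropy conditions for a discontinuity can be illustrated with a short sketch. The choice $F(u) = u^2/2$, $a(u) = u$ used in the demonstration is an assumption made only for this example (the text does not fix a particular flux), and the function names are hypothetical.

```python
def shock_speed(flux, u_left, u_right):
    """Rankine-Hugoniot speed: sigma' = [F] / [u]."""
    return (flux(u_right) - flux(u_left)) / (u_right - u_left)


def satisfies_entropy_condition(velocity, flux, u_left, u_right):
    """Entropy condition: a(u_right) < sigma' < a(u_left)."""
    sigma_dot = shock_speed(flux, u_left, u_right)
    return velocity(u_right) < sigma_dot < velocity(u_left)


if __name__ == "__main__":
    # Assumed illustrative flux F(u) = u^2 / 2, so that a(u) = F'(u) = u.
    F = lambda u: 0.5 * u * u
    a = lambda u: u
    print(shock_speed(F, 2.0, 0.0))                     # admissible jump
    print(satisfies_entropy_condition(a, F, 2.0, 0.0))
    print(satisfies_entropy_condition(a, F, 0.0, 2.0))  # reversed states
```

For this flux, a jump from $u_{\text{left}} = 2$ to $u_{\text{right}} = 0$ travels with speed 1 and is admissible, while the reversed jump violates the entropy condition.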



6.1 Neural networks in approximating conservation law models 

There is an intrinsic analogy between the information flow pattern for neural networks used in applications (e.g.,
Fig. 1) and information flows represented by stencils of numerical approximations of conservation laws (e.g.,
Fig. 2). Consider first some basic classical schemes for numerical approximations of (6.1). We start from
the forward centered Euler scheme

$$u_j^{n+1} = u_j^n - \frac{\lambda}{2}\, a\, (u_{j+1}^n - u_{j-1}^n). \qquad (6.7)$$

The flow of information for this scheme is given in Fig. 2(a), and it can be seen that in the terminology
of neural networks the stencil for this approximation (as well as for an improved approximation in which
$u_j^n$ is replaced by $\frac{1}{2}(u_{j+1}^n + u_{j-1}^n)$) does not contain any hidden layers. The situation becomes more involved for the Leap-Frog
scheme (see Fig. 2(b)), where several layers of network architecture are coupled:

$$u_j^{n+1} = u_j^{n-1} - \lambda a\, (u_{j+1}^n - u_{j-1}^n). \qquad (6.8)$$

This coupling is sequential in nature, and hence our consideration is pertinent to network architectures
where the layers are sequentially linked. From a numerical point of view such architectures can be represented
by predictor-corrector-type schemes. One of the most effective practical tools in the numerical approximation
of conservation laws that reflects this architecture is the Lax-Wendroff scheme

$$u_j^{n+1} = u_j^n - \lambda\left(h_{j+1/2}^n - h_{j-1/2}^n\right), \quad \text{where} \quad h_{j+1/2}^n = \frac{1}{2}\, a\, (u_{j+1}^n + u_j^n) - \frac{\lambda}{2}\, a^2 (u_{j+1}^n - u_j^n). \qquad (6.9)$$






Figure 2: Network architectures and information flow chart for numerical approximations of conservation 
laws: (a) forward centered Euler, (b) leap-frog, (c) Lax-Wendroff. 

As is clear from Fig. 2(c), this scheme includes a "hidden" layer $j + 1/2$ where one has to determine
fluxes before moving to the next network layer. In what follows we will show that this scheme and many
other effective schemes for numerical approximations of conservation laws can be derived from the general
probabilistic approach based on the association of the conservation law models with Markov chain training
processes for some neural network architectures in a way similar to that described in Section 5. Moreover,
the interpretation/association of the stencils of these schemes with neural network architectures will allow
us to deal with the stability conditions in an efficient manner by using the consistency conditions of Markov
chains with the original process.
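
The role of the "hidden" layer of fluxes in the Lax-Wendroff scheme can be sketched in a few lines. This is an illustrative sketch only: it assumes a constant velocity $a$ and periodic boundary conditions, neither of which is prescribed by the text, and the function name is hypothetical.

```python
def lax_wendroff_step(u, a, lam):
    """One Lax-Wendroff step in flux form, lam = tau / h.

    Assumptions for this sketch: constant velocity a, periodic boundaries.
    """
    n = len(u)
    # "Hidden layer": flux[j] stores the numerical flux h_{j+1/2}.
    flux = [
        0.5 * (a * (u[(j + 1) % n] + u[j]) - lam * a * a * (u[(j + 1) % n] - u[j]))
        for j in range(n)
    ]
    # Output layer: u_j - lam * (h_{j+1/2} - h_{j-1/2}); flux[-1] wraps around.
    return [u[j] - lam * (flux[j] - flux[j - 1]) for j in range(n)]


if __name__ == "__main__":
    print(lax_wendroff_step([0.0, 1.0, 2.0, 0.0, 0.0], a=1.0, lam=1.0))
```

For $\lambda a = 1$ the step reduces to an exact shift of the profile by one cell, and for any $\lambda$ the flux form conserves the sum of the grid values on a periodic grid.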



In the spirit of Section 5, the architecture of neural networks is associated with the cone of macroscopic
events. From the numerical point of view this is equivalent to the definition of the following points

$$[(x_i, t^0),\ i = 0, \dots, 2n] \to [(x_i, t^1),\ i = 1, \dots, 2n-1] \to [(x_i, t^2),\ i = 2, \dots, 2n-2] \to \dots \to [(x_i, t^{n-1}),\ i = n-1, n, n+1] \to [(x_i, t^n),\ i = n] \qquad (6.10)$$

in the cone of macroscopic events, where the approximation of the evolution of dynamic systems described
by the conservation law takes place. An important point to emphasize is that in this general framework the
stability conditions for the associated schemes can be obtained in explicit form. Consider, for example,
scheme (5.18), (5.19) for the solution of the homogeneous problem (5.4), (5.5). First note that if the following
condition

$$v^+(1 - \gamma_1 - 3\gamma_2) + v^-(1 + \gamma_4 + 3\gamma_3) = \left[v^-(1 + \gamma_3 - \gamma_4) - v^+(1 + \gamma_1 - \gamma_2)\right]^2 \qquad (6.11)$$

is met, then the Courant-Friedrichs-Lewy-type (CFL-type) stability condition (5.17) is satisfied [25]. Since
either $v^-$ or $v^+$ is zero while the other is non-zero, we consider two cases. Our aim in this example is to
derive explicit expressions for the flux limiters. In turn, as follows from our discussion in Section 5, this
will define stability conditions of the associated scheme and will complete the definition of the transition
probabilities of the associated Markov chain.
1. If $v^- = 0$ and $v^+ \ne 0$, then from (6.11) we have

$$1 - \gamma_1 - 3\gamma_2 = v^+(1 + \gamma_1 - \gamma_2)^2. \qquad (6.12)$$

This leads to a quadratic equation with respect to $\gamma_1$:

$$v^+\gamma_1^2 + \left[2v^+(1 - \gamma_2) + 1\right]\gamma_1 + \left[v^+(1 - \gamma_2)^2 + 3\gamma_2 - 1\right] = 0. \qquad (6.13)$$



Figure 3: Flux limiters in the probabilistic approach for conservation laws (grid nodes $j-2$, $j-1$, $j+1$, $j+2$ and half-nodes $j-1/2$, $j+1/2$).



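In the particular case where $\gamma_2 = 0$, the solution to (6.13) can be easily found:

$$\gamma_1^{1,2} = \frac{-(2v^+ + 1) \pm \sqrt{8v^+ + 1}}{2v^+}. \qquad (6.14)$$

Since $\gamma_1$ should be non-positive (see (5.14)), we take the "minus" sign in (6.14), leading to

$$\gamma_1 = -1 - \frac{\sqrt{8v^+ + 1} + 1}{2v^+}. \qquad (6.15)$$

Similar reasoning leads to the expression for $\gamma_1$ in the general case of (6.13),

$$\gamma_1^{1,2} = \frac{-\left[2v^+(1 - \gamma_2) + 1\right] \pm \sqrt{8v^+ + 1 - 16v^+\gamma_2}}{2v^+}, \qquad (6.16)$$

and finally to

$$\gamma_1 = -1 + \gamma_2 - \frac{\sqrt{8v^+ + 1 - 16v^+\gamma_2} + 1}{2v^+}. \qquad (6.17)$$
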



2. In the same vein, we can obtain the expression for $\gamma_4$ in the case where $v^+ = 0$ and $v^- \ne 0$ as a
solution to the equation

$$1 + \gamma_4 + 3\gamma_3 = v^-(1 + \gamma_3 - \gamma_4)^2, \qquad (6.18)$$

which can be written in a form that is easily solved with respect to $\gamma_4$:

$$v^-\gamma_4^2 - \left[2v^-(1 + \gamma_3) + 1\right]\gamma_4 + \left[v^-(1 + \gamma_3)^2 - 1 - 3\gamma_3\right] = 0. \qquad (6.19)$$

The solution to the last equation is determined as

$$\gamma_4 = 1 + \gamma_3 + \frac{\sqrt{8v^- + 1 + 16v^-\gamma_3} + 1}{2v^-} \qquad (6.20)$$

(the "plus" sign was taken due to the non-negativeness of $\gamma_4$, see (5.14)). In the case $\gamma_3 = 0$, (6.20) reduces
to the expression obtained in [25].
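
The closed-form limiters for the two cases can be verified by substituting them back into condition (6.11). The sketch below is illustrative only: the function names are hypothetical, and $v^+$ and $v^-$ are passed as nonnegative magnitudes, which is an assumption about the sign convention.

```python
import math


def gamma1(v_plus, g2=0.0):
    """Limiter gamma_1 for the case v_minus = 0 (root with the minus sign)."""
    root = math.sqrt(8.0 * v_plus + 1.0 - 16.0 * v_plus * g2)
    return -1.0 + g2 - (root + 1.0) / (2.0 * v_plus)


def gamma4(v_minus, g3=0.0):
    """Limiter gamma_4 for the case v_plus = 0 (root with the plus sign)."""
    root = math.sqrt(8.0 * v_minus + 1.0 + 16.0 * v_minus * g3)
    return 1.0 + g3 + (root + 1.0) / (2.0 * v_minus)


def residual(v_plus, v_minus, g1, g2, g3, g4):
    """Left-hand side minus right-hand side of condition (6.11)."""
    lhs = v_plus * (1.0 - g1 - 3.0 * g2) + v_minus * (1.0 + g4 + 3.0 * g3)
    rhs = (v_minus * (1.0 + g3 - g4) - v_plus * (1.0 + g1 - g2)) ** 2
    return lhs - rhs


if __name__ == "__main__":
    g1 = gamma1(1.0)   # case 1: v_minus = 0, gamma_2 = 0
    g4 = gamma4(1.0)   # case 2: v_plus = 0, gamma_3 = 0
    print(g1, residual(1.0, 0.0, g1, 0.0, 0.0, 0.0))
    print(g4, residual(0.0, 1.0, 0.0, 0.0, 0.0, g4))
```

For $v^+ = 1$, $\gamma_2 = 0$ this gives $\gamma_1 = -3$, and for $v^- = 1$, $\gamma_3 = 0$ it gives $\gamma_4 = 3$; in both cases the residual of (6.11) vanishes.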



This probabilistic approach based on the association of conservation law approximations and the neural
network training process is amenable to the treatment of dissipative systems by using the framework of
perturbed generalized dynamic systems developed in [25] (see also references therein). However, to make
our basic concepts transparent to the reader, we concentrate in this paper on dynamic systems without
dissipation. Since the representation (4.1) of a dynamic system (and its consequences) from a numerical
analysis point of view is an explicit scheme, we recall [33], [20] that a family of explicit schemes for the
solution of (6.1), (6.2) can be written as

$$u_j^{n+1} = u_j^n - \lambda\left(h_{j+1/2}^n - h_{j-1/2}^n\right), \qquad (6.21)$$

where $h_{j+1/2}^n = h(u_j^n, u_{j+1}^n)$ is the numerical flux that can model hidden layers in the network architecture
associated with the given scheme.

We propose the following general approximation of the flux at $x_j + 0.5\tau$ and $x_j - 0.5\tau$, respectively:

$$h_{j+1/2} = \sum_{k=j-1}^{j+2} b_k(\tau, h, v_k)\, u_k = b_{j-1}u_{j-1} + b_j u_j + b_{j+1}u_{j+1} + b_{j+2}u_{j+2}, \qquad (6.22)$$

$$h_{j-1/2} = \sum_{k=j-1}^{j+2} b_k(\tau, h, v_k)\, u_{k-1} = b_{j-1}u_{j-2} + b_j u_{j-1} + b_{j+1}u_j + b_{j+2}u_{j+1}, \qquad (6.23)$$

where all coefficients in (6.22) and (6.23) are velocity-dependent approximations of the flux limiters, for
example, $b_{j-1} = b_{j-1}(\tau, h, v_{j-1})$ in (6.22) and $b_{j-1} = b_{j-1}(\tau, h, v_{j-2})$ in (6.23), etc. Then, the general family of the explicit schemes
can be written as follows:

$$u_j^{n+1} = u_j^n - \lambda\left[-b_{j-1}u_{j-2} + (b_{j-1} - b_j)u_{j-1} + (b_j - b_{j+1})u_j + (b_{j+1} - b_{j+2})u_{j+1} + b_{j+2}u_{j+2}\right]$$
$$\phantom{u_j^{n+1}} = u_j^n - \lambda\left[-b_{j-1}u_{j-2} + \sum_{k=j-1}^{j+1} (b_k - b_{k+1})u_k + b_{j+2}u_{j+2}\right]. \qquad (6.24)$$

Note that this technique is applicable to the case where $b_m = b_m(\tau, h, \psi(v_{m-1}, v_m))$, e.g. $\psi(v_{m-1}, v_m) =$
6.2 Special Cases and Examples 

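Now, we demonstrate our methodology on a number of examples and show that most efficient schemes
for numerical approximations of conservation laws become just special cases of our general network-based
approach.

Example 1. First of all, it is easy to confirm that by choosing the flux limiters in the form

$$b_{j-1} = 0, \quad b_{j-1} - b_j = -\frac{1}{2}a, \quad b_j - b_{j+1} = 0, \quad b_{j+1} - b_{j+2} = \frac{1}{2}a, \quad b_{j+2} = 0, \qquad (6.25)$$

we obtain the forward centered Euler scheme for which the numerical flux is defined as

$$h_{j+1/2} = \frac{1}{2}\, a\, (u_{j+1} + u_j). \qquad (6.26)$$

Example 2. If we now take

$$b_{j-1} = 0, \quad -\lambda(b_{j-1} - b_j) = \frac{1}{2} + \frac{\lambda}{2}a, \quad 1 - \lambda(b_j - b_{j+1}) = 0, \quad -\lambda(b_{j+1} - b_{j+2}) = \frac{1}{2} - \frac{\lambda}{2}a, \quad b_{j+2} = 0, \qquad (6.27)$$

we arrive at the following result:

$$b_j = \frac{1}{2}a + \frac{1}{2\lambda}, \quad b_{j+1} = \frac{1}{2}a - \frac{1}{2\lambda}. \qquad (6.28)$$

Therefore, we confirm that under the above choice of the flux limiter coefficients, we obtain the Lax-Friedrichs
scheme

$$u_j^{n+1} = \frac{1}{2}\left(u_{j+1}^n + u_{j-1}^n\right) - \frac{\lambda a}{2}\left(u_{j+1}^n - u_{j-1}^n\right), \qquad (6.29)$$

which can be written in the form (6.21) with

$$h_{j+1/2} = \frac{1}{2}\left[a(u_{j+1} + u_j) - \frac{1}{\lambda}(u_{j+1} - u_j)\right]. \qquad (6.30)$$
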

Example 3. By choosing the flux limiters in the general architecture (6.21)-(6.23) as

$$b_{j-1} = b_{j+2} = 0, \quad b_{j-1} - b_j = -\frac{1}{2}a - \frac{1}{2}|a|, \quad b_j - b_{j+1} = |a|, \qquad (6.31)$$

we also confirm that

$$b_j = \frac{1}{2}a + \frac{|a|}{2}, \quad b_{j+1} = \frac{1}{2}a - \frac{|a|}{2}. \qquad (6.32)$$

It can be seen that this leads to the (upwind) Euler uncentered scheme, representable in the general form
(6.21) with

$$h_{j+1/2} = \frac{1}{2}\left[a(u_{j+1} + u_j) - |a|(u_{j+1} - u_j)\right]. \qquad (6.33)$$



Example 4. Taking

$$b_{j-1} = 0, \quad b_j = \frac{1}{2}a + \frac{\lambda a^2}{2}, \quad b_{j+1} = \frac{1}{2}a - \frac{\lambda a^2}{2}, \quad b_{j+2} = 0 \qquad (6.34)$$

in the general formulation (6.21)-(6.23) leads to one of the most popular schemes for the numerical
approximation of conservation law (6.3), the Lax-Wendroff scheme, discussed briefly earlier in this section,
where

$$h_{j+1/2} = \frac{1}{2}\left[a(u_{j+1} + u_j) - \lambda a^2(u_{j+1} - u_j)\right]. \qquad (6.35)$$



Example 5. Finally, we note that the family of schemes proposed in [25] and briefly reviewed in Section 5
is also a subset of the general representation given by (6.21)-(6.23). Indeed, if we take

$$b_{j-1} = v^+\gamma_2, \quad b_{j-1} - b_j = -v^+(1 + \gamma_2) - v^-\gamma_4, \quad b_j - b_{j+1} = |v| + v^-\gamma_4 - v^+\gamma_1,$$
$$b_{j+1} - b_{j+2} = -v^-(1 - \gamma_3) + v^+\gamma_1, \quad b_{j+2} = v^-\gamma_3 \qquad (6.36)$$

(as before, $\gamma_i$ are flux limiters of the scheme determined from the stability and consistency conditions), we
can determine the flux coefficient $b_{j+1}$ in two different ways. First, "moving from the left" (see Fig. 3) it is easy
to obtain the following chain of relationships:

$$b_j = v^+(1 + 2\gamma_2) + v^-\gamma_4, \quad b_{j+1} = b_j - |v| - v^-\gamma_4 + v^+\gamma_1 = v^+(1 + 2\gamma_2 + \gamma_1) - |v|. \qquad (6.37)$$

On the other hand, "moving from the right" (see Fig. 3) we get that

$$b_{j+1} = v^-\gamma_3 - v^-(1 - \gamma_3) + v^+\gamma_1 = v^-(2\gamma_3 - 1) + v^+\gamma_1. \qquad (6.38)$$

Therefore, by equating the two expressions for $b_{j+1}$ we arrive at the conclusion that

$$v^+(1 + 2\gamma_2) + v^-(1 - 2\gamma_3) = |v|, \qquad (6.39)$$

which can be satisfied by setting $\gamma_2 = \gamma_3 = 0$. Having these flux limiters, $\gamma_1$ and $\gamma_4$ can be controlled by
satisfying the stability conditions [25].
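
The way the classical schemes arise from the general family (6.24) can be checked with a short sketch. It assumes constant coefficients $b_k$, a constant velocity $a$, and periodic boundaries, all of which are simplifications made for illustration; the function names are hypothetical.

```python
def general_explicit_step(u, b, lam):
    """One step of the general family (6.24) with constant coefficients.

    b = (b_{j-1}, b_j, b_{j+1}, b_{j+2}); periodic boundaries are assumed.
    """
    b_m1, b_0, b_p1, b_p2 = b
    n = len(u)
    new_u = []
    for j in range(n):
        u_m2, u_m1, u_0, u_p1, u_p2 = (u[(j + s) % n] for s in (-2, -1, 0, 1, 2))
        bracket = (-b_m1 * u_m2 + (b_m1 - b_0) * u_m1 + (b_0 - b_p1) * u_0
                   + (b_p1 - b_p2) * u_p1 + b_p2 * u_p2)
        new_u.append(u_0 - lam * bracket)
    return new_u


def flux_coefficients(scheme, a, lam):
    """Coefficients (b_{j-1}, b_j, b_{j+1}, b_{j+2}) for Examples 1-4."""
    table = {
        "centered_euler": (0.0, 0.5 * a, 0.5 * a, 0.0),
        "lax_friedrichs": (0.0, 0.5 * a + 0.5 / lam, 0.5 * a - 0.5 / lam, 0.0),
        "upwind": (0.0, 0.5 * a + 0.5 * abs(a), 0.5 * a - 0.5 * abs(a), 0.0),
        "lax_wendroff": (0.0, 0.5 * a + 0.5 * lam * a * a,
                         0.5 * a - 0.5 * lam * a * a, 0.0),
    }
    return table[scheme]


if __name__ == "__main__":
    u = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0]
    for name in ("centered_euler", "lax_friedrichs", "upwind", "lax_wendroff"):
        print(name, general_explicit_step(u, flux_coefficients(name, 0.8, 0.5), 0.5))
```

Each choice of coefficients reproduces the familiar three-point update of the corresponding scheme, so a single routine covers all four examples.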
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Figure 4: A schematic representation of flux limiters as functions of velocity 

7 Stability and Consistency Requirements for Network-Based 
Approximations of Conservation Laws 

In the previous section we derived numerical approximations for conservation laws using the general network-based
methodology. As we have discussed in Section 4, this methodology originates from the probabilistic
foundations of the dynamics. Indeed, having approximate initial data (e.g., as a result of measurements)
we attempt to extrapolate this data further in time. This procedure is subject to probabilistic error.
Therefore, it is important to discuss further stability and consistency requirements for the schemes obtained
earlier in this paper, in particular, for the general family of approximations (6.21)-(6.23).

The probabilistic approach to conservation laws allows us to explain a number of important phenomena
related to numerical approximations of these models in a straightforward way. First, it is easy to check that
the sum of all coefficients of the unknown function in every scheme discussed in the previous section is
1. However, this fact alone might not be sufficient to interpret those coefficients as transition probabilities.
Indeed, the nonnegativity requirement plays a key role in the possibility of such an interpretation. From
a numerical point of view this requirement leads to the stability conditions of the scheme. If we take the
forward Euler/centered scheme, this requirement leads to $\lambda a/2 \le 0$, which can be satisfied only if $a \le 0$
($\lambda = \tau/h$). However, since we use here a forward scheme with the stencil depicted in Fig. 2, this should not
come as a surprise, because for $a > 0$ the explicit scheme on this stencil is known to be unstable.

It is easy to check that coefficients in all schemes considered in Sections 5 and 6 will be nonnegative
subject to the Courant-Friedrichs-Lewy-type stability conditions $|\lambda a| \le 1$ (as one would expect for the
models based on hyperbolic PDEs [53]) and therefore, all these schemes can be interpreted in a probabilistic
sense. Naturally, however, satisfying those stability conditions is subject to the appropriate choice of
the flux limiters. For example, for the scheme (5.18), (5.19) discussed in Section 5, such flux limiters ($\gamma_1$ and $\gamma_4$)
should be chosen so as to satisfy the following conditions:

$$\frac{\tau}{h}\left(|v| + v^-\gamma_4 - v^+\gamma_1\right) \le 1, \quad \frac{1}{2}v^+ + v^-\gamma_4 \ge 0, \quad -\frac{1}{2}v^- + v^+\gamma_1 \le 0. \qquad (7.1)$$

Moreover, one has to satisfy the consistency conditions, which in the case of this scheme have the form

$$\tau\left[v^-(3/2 - \gamma_4) - v^+(3/2 + \gamma_1) - v\right] = o(\tau + h), \qquad (7.2)$$

where, as before, $o$ denotes the Landau symbol.



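In our general case of the family (6.21)-(6.23), the nonnegativity of the coefficients in (6.24) requires

$$\lambda b_{j-1} \ge 0, \quad \lambda(b_j - b_{j-1}) \ge 0, \quad 1 - \lambda(b_j - b_{j+1}) \ge 0, \quad \lambda(b_{j+2} - b_{j+1}) \ge 0, \quad -\lambda b_{j+2} \ge 0. \qquad (7.3)$$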
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Figure 5: Markov chain as a tool to combine feedforward and feedbackwards neural networks 



This results in

$$\lambda(b_j - b_{j+1}) \le 1, \quad \text{where} \quad b_{j+1} \le b_{j+2} \le 0 \le b_{j-1} \le b_j \qquad (7.4)$$

(see Fig. 4 for interpretation). Consistency conditions for the general scheme (6.21)-(6.23) can also be
obtained by using the methodology developed in [25]. Indeed, we assume that each jump of the associated
Markov chain is allowed within a region (e.g., rectangular) with characteristic lengths $\tau$ and $h$ such that
$\tau \le h$ (if necessary, such a region can be easily refined in a way similar, for example, to the finite element
methodology). If the Markov chain is in state $x$, we associate this state with the position of the Markov
chain, i.e. we assume that the position of the chain equals $x$. At the next moment of time, $\tau \to \tau + \Delta t$, the Markov chain is assumed
to be in one of the following states: $x - 2h$, $x - h$, $x$, $x + h$, or $x + 2h$ (see Fig. 5, left). This can be
extended to the case of an arbitrary number of states in a straightforward manner. The table of transition
probabilities from the old to a new state of the Markov chain associated with our scheme (6.21)-(6.23)
can be constructed as follows.



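New state ($x_i$)      Probability of transition ($p_i$)
$x - h$                $\frac{\tau}{h}(b_j - b_{j-1})$
$x + h$                $\frac{\tau}{h}(b_{j+2} - b_{j+1})$
$x$                    $1 - \frac{\tau}{h}(b_j - b_{j+1})$
$x - 2h$               $\frac{\tau}{h}\, b_{j-1}$
$x + 2h$               $-\frac{\tau}{h}\, b_{j+2}$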
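
The transition table can be generated directly from the flux coefficients. The sketch below reads the probabilities off the coefficients of $u_{j-2}, \dots, u_{j+2}$ in (6.24); the function names and the sample values are assumptions made for illustration, and the central entry is the coefficient of $u_j$ in (6.24).

```python
def transition_table(b, tau, h):
    """Transition probabilities keyed by the jump k (new state x + k*h).

    b = (b_{j-1}, b_j, b_{j+1}, b_{j+2}); entries are the coefficients of
    u_{j+k} in the general scheme (6.24).
    """
    b_m1, b_0, b_p1, b_p2 = b
    lam = tau / h
    return {
        -2: lam * b_m1,
        -1: lam * (b_0 - b_m1),
        0: 1.0 - lam * (b_0 - b_p1),
        +1: lam * (b_p2 - b_p1),
        +2: -lam * b_p2,
    }


def mean_jump(table, h):
    """Expected displacement of the chain over one time step."""
    return sum(p * k * h for k, p in table.items())


if __name__ == "__main__":
    # Upwind coefficients for a = 1 (Example 3), used only as sample values.
    upwind = (0.0, 1.0, 0.0, 0.0)
    table = transition_table(upwind, tau=0.25, h=0.5)
    print(table, sum(table.values()), mean_jump(table, h=0.5))
```

The entries always sum to one, and the mean jump equals $-\tau \sum_k b_k$, which is how the sum of the flux coefficients enters the local consistency condition.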



We can now easily relate this jump of the Markov chain to the transition to the next time layer in our
numerical approximation of the conservation law. Therefore, we are in a position to derive the explicit form
of the local consistency condition for the family of schemes (6.21)-(6.23). Since



where s is the number of states allowed in the next time layer (in our case s = 5), after some simplifications 
this condition can be given in the form 



s 




(7.5) 



i=l 




(7.6) 



k=j-i 



The sign is chosen to be positive due to the direction of computations (see Fig. 5, right).



For example, consider the family of schemes discussed in Section 5 (a subclass of the general family
(6.21)-(6.23)). Note that in this case

$$\sum_{k=j-1}^{j+2} b_k = v^+\left[2(1 + 2\gamma_2) + \gamma_1 + \gamma_2\right] + v^-(\gamma_3 + \gamma_4) - |v|. \qquad (7.7)$$

Take, for example, the limiters as $\gamma_1 = \gamma_2 = -1/2$ and $\gamma_3 = \gamma_4 = 1/2$. This reduces the consistency
condition to

$$-v^+ + v^- - |v| = \sum_{k=j-1}^{j+2} b_k. \qquad (7.8)$$



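Note also that the resulting scheme in this case has the following form:

$$u_i^{j+1} = u_i^j + \frac{\tau}{h}\frac{|v|}{2}\left(u_{i+1}^j - 3u_i^j + u_{i-1}^j\right) + \frac{\tau}{2h}\left[v^+ u_{i-2}^j + v^- u_{i+2}^j\right]. \qquad (7.9)$$

It is easy to conclude that if $v > 0$ (and hence $-2v = \sum_{k=j-1}^{j+2} b_k$) scheme (7.9) is reducible to

$$u_i^{j+1} = u_i^j + \frac{\tau v}{2h}\left(u_{i+1}^j - 3u_i^j + u_{i-1}^j + u_{i-2}^j\right), \qquad (7.10)$$

whereas if $v < 0$ (and hence $\sum_{k=j-1}^{j+2} b_k = 0$) we have

$$u_i^{j+1} = u_i^j + \frac{\tau |v|}{2h}\left(u_{i+2}^j + u_{i+1}^j - 3u_i^j + u_{i-1}^j\right). \qquad (7.11)$$
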

The global consistency condition involves the velocity of the associated Markov chain (see, e.g., (5.14)
and further details in [25]). Recall that for the family of schemes constructed in [25], the Markov chain
velocity (that coincides with the velocity of the dynamic system/process when $n \to \infty$) is determined in
terms of the flux limiters as follows:

$$v_{mc} = v^- - v^+ = -v. \qquad (7.12)$$

This result is independent of the sign of $v$. It shows that the Markov chain evolves in the direction opposite
to the evolution of the dynamic system. This allows us to write the global consistency
condition for the family of schemes (6.21)-(6.23) in the explicit form

$$\tau\left[3h|v| - \tau v^2\right] = o(\tau + h). \qquad (7.13)$$



In conclusion, we note that, as follows from (7.6), the Lax-Wendroff scheme's consistency condition takes
the form

$$\tau(b_j + b_{j+1}) = o(\tau + h) \quad \text{or} \quad \tau a = o(\tau + h), \qquad (7.14)$$

which leads to the CFL-type stability condition

c(r + h) 

2tv 

where c = lim is a Landau constant participating in the definition of probabilistic characteristics 

T^O, h^O T + h 

of conservation laws [25]. Consistency conditions for other schemes considered in Section 6 can be obtained 
in a similar manner. 



8 Conclusions 



In this paper, a general framework for the analysis of a connection between the training of artificial neural 
networks via the dynamics of Markov chains and the approximation of conservation law models has been 
proposed. This framework allows us to demonstrate an intrinsic link between microscopic and macroscopic 
models for evolution via the concept of perturbed generalized dynamic systems. We have shown that
mathematical models describing dynamics of network training can be treated effectively by using the concept 
of perturbed Markov chains associated with the original dynamic system. We have developed a general 
methodology allowing us to derive computational models for the dynamics of network training and to obtain 
constructive algorithms, as well as stability conditions for numerical approximations of conservation laws. 
We have demonstrated how this methodology can be applied to numerical approximations of conservation 
laws viewed here as Markov chain network training procedures. Our main results have been exemplified 
with several illustrative examples for which we have explicitly derived stability and consistency conditions. 
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